Ontology learning from domain specific web documents

Authors

  • Maryam Hazman
  • Samhaa R. El-Beltagy
  • Ahmed A. Rafea
Abstract

Ontologies play a vital role in many Web and Internet-related applications. However, the demand for ontologies is not always matched by their availability, and manually constructing a high-quality ontology is an expensive and time-consuming process. This work presents a system that accelerates ontology building by semi-automatically learning a hierarchical ontology from a set of domain-specific web documents and a set of seed concepts. The methods are tested with Web documents in the domain of agriculture. The ontology is constructed using two complementary approaches. The first uses the structure of phrases appearing in HTML headings, while the second uses the hierarchical structure of the HTML headings to identify new concepts and their taxonomic relationships to the seed concepts and to each other. The presented system has been used to build an ontology in the agricultural domain from a set of Arabic extension documents. The resulting ontology was evaluated against a modified version of the AGROVOC ontology.
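The second approach in the abstract, reading taxonomic links off the nesting of HTML headings, can be illustrated with a minimal sketch. This is not the authors' system: the names `HeadingCollector`, `heading_pairs`, and `grow_from_seeds` and the sample HTML are hypothetical, and the rule "a heading's parent is the nearest preceding heading of a higher rank" is an assumption made for illustration.

```python
from html.parser import HTMLParser

# Map heading tags to their rank (h1 is the highest rank).
HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every h1..h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # rank of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = HEADING_TAGS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._parts).split())  # normalize spaces
            if text:
                self.headings.append((self._level, text))
            self._level = None


def heading_pairs(html):
    """Return (parent, child) pairs implied by heading nesting (assumed rule)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    stack = []   # currently open ancestors as (level, text)
    pairs = []
    for level, text in collector.headings:
        # Close every heading of the same or lower rank.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            pairs.append((stack[-1][1], text))
        stack.append((level, text))
    return pairs


def grow_from_seeds(pairs, seeds):
    """Keep only the pairs reachable from a seed concept (candidate links)."""
    known = {s.lower() for s in seeds}
    kept = []
    for parent, child in pairs:   # pairs are in document order
        if parent.lower() in known:
            kept.append((parent, child))
            known.add(child.lower())
    return kept


# Invented sample page; "wheat" plays the role of a seed concept.
SAMPLE = """
<h1>Wheat</h1>
<h2>Wheat diseases</h2>
<h3>Leaf rust</h3>
<h3>Powdery mildew</h3>
<h2>Irrigation</h2>
<h1>Contact us</h1>
<h2>Address</h2>
"""

links = grow_from_seeds(heading_pairs(SAMPLE), ["wheat"])
for parent, child in links:
    print(f"{child} -> is under -> {parent}")
```

In this sketch the headings under "Contact us" are discarded because no seed concept dominates them, which is one simple way seed concepts could keep navigation headings out of the candidate taxonomy. A semi-automatic system would still pass such candidates to a human for review.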


Similar resources

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

As the Web grows, there are many challenges in establishing a general framework for data mining and for retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...


Ontology Learning and Population: Bridging the Gap between Text and Knowledge

Ontology Learning has so far been dominated by techniques that use text as input. Only a few methods use a different data source. Techniques that use highly structured data as input have the disadvantage that such data sources are rare. On the other hand, enormous amounts of Web content are available today. We present the XTREEM (Xhtml TREE Mining) methods which enable Onto...


Classification of Web Documents Using Concept Extraction from Ontologies

In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present the ontology-based web content mining methodology that contains such main stages as creation of ontology for the specified domain, collecting a training set of labeled documents, building a classification model in this domain using the constructed onto...


Designing a semi-automatic ontology construction system using word co-occurrence analysis and the C-value method (case study: the field of scientometrics in Iran)

An ontology is a formal description of the concepts and relations in a specific domain. Recent work has tried to design automatic methods for ontology learning. Since an ontology contains concepts and relations, learning one involves extracting the concepts and the semantic relations among them. Building ontologies for various domains and applications is an expensive process unless it is automated. The lack of main knowledg...


Automated ontology construction for unstructured text documents

Ontology is playing an increasingly important role in knowledge management and the Semantic Web. This study presents a novel episode-based ontology construction mechanism to extract domain ontology from unstructured text documents. Additionally, fuzzy numbers for conceptual similarity computing are presented for concept clustering and taxonomic relation definitions. Moreover, concept attributes...



Journal:
  • IJMSO

Volume 4

Publication date: 2009